Freeform SQL
Freeform SQL allows you to apply a complex condition across multiple tables or columns. You can also use it when you need more flexibility or customization of your rules.
Note Use the built-in Spark functions for non-Pushdown connections. Native SQL will not work, unless you switch to Native SQL rules. See (https://spark.apache.org/docs/2.3.0/api/sql/) for Simple and Freeform SQL rules.
Individual statement
Syntax
SELECT * FROM @<dataset_name> <table_alias>
WHERE <filter_expression>
GROUP BY <group_by_expression>
HAVING <having_expression>
The base of the statement is given with @<dataset_name> style. In general, the <dataset_name> is the same, where the rule is attached to, but you can generally use any valid dataset name in the expression.
Examples
Simple rule expression
opt.dataset = "example_ds"
val rule = OwlUtils.createRule(opt.dataset)
rule.setRuleNm("is_city_not_null_or_empty")
rule.setRuleValue("select * from @example_ds t where t.amount > '5000'")
rule.setRuleType("SQLF")
rule.setPerc(1.0)
rule.setPoints(1)
rule.setIsActive(1)
rule.setUserNm("admin")
Complex rule expression
opt.dataset = "unique_rule_ds"
val rule = RuleBll.createRule(opt.dataset)
rule.setRuleNm("unique_rule")
rule.setRuleValue("select * from ( select count(*) as cnt, customer_id from @unique_rule_ds group by customer_id ) having cnt > 1")
rule.setRuleType("SQLF")
rule.setPerc(1.0)
rule.setPoints(1)
rule.setIsActive(1)
rule.setUserNm("admin")
RegExp expression
opt.dataset = "regexp_rule_ds"
val rule = RuleBll.createRule(opt.dataset)
rule.setRuleNm("LIKE_rule")
rule.setRuleValue("select * from @regexp_rule_ds.SYMBOL rlike '^ABB+' ")
rule.setRuleType("SQLG")
rule.setPerc(0.02)
rule.setPoints(1)
rule.setIsActive(1)
rule.setUserNm("admin")
Join statements
Available join types between multiple datasets
- WHERE tableA.id = tableB.id style
- INNER JOIN
- LEFT <OUTER> JOIN
- RIGHT <OUTER> JOIN
Joining other datasets
A @t1 rule is an out-of-the-box rule type that creates a dataset using the previous run’s date as the run date to read data from a multi-day run period so that data from the current and previous days can be evaluated.
To get the historical state of the same dataset:
Syntax: @t<n>, where the parameter n means, the historical run (number) of the base dataset (marked with @<dataset_name>) to instruct the temporary job created at runtime to target. The following table demonstrates what each of the following @tn value examples mean:
@tn value | Description |
---|---|
@t1 | Loads the data from the previous scan. |
@t2 | Loads the data from the second to last scan. |
@t3 | Loads the data from the third to last scan. |
@t4 | Loads the data from the fourth to last scan |
To get a different dataset:
Syntax: @<other_dataset_name>
Note When you use a rule with @t1 syntax, a temporary job in the Jobs queue appears. This temporary job displays a LOAD status to allow the run that the @t1 is targeting to load and use it for the rule.
Cross-Connection Libraries
Note When applying cross-connection rules please use the -addlib to submit the job with the appropriate jar files. In this example, a secondary set of jars is added through the Explorer. These files are located in the /opt/owl/drivers/mysql directory. The path should not contain double quotes or single quotes. It should point to a directory without spaces in the path.
WHERE style
Lookback dataset
SELECT * FROM @<dataset_name> <table_alias>, @t1 [<history_table_alias>]
WHERE <join_expression> AND <filter_expression>
Example
opt.dataset = "example_ds"
val rule = OwlUtils.createRule(opt.dataset)
rule.setRuleValue("select * from @example_ds t, @t1 where t.customer_id = t1.customer_id and t.card_number <> t1.card_number ")
rule.setRuleType("SQLF")
Different dataset
SELECT * FROM @<dataset_name> <table_alias>, @<other_dataset_name> [<other_alias>]
WHERE <join_expression> AND <filter_expression>
Example
opt.dataset = "example_ds"
opt2.dataset = "other_ds"
val rule = RuleBll.createRule(opt.dataset)
rule.setRuleValue("select * from @example_ds t, @other_ds ds2 where t.customer_id = ds2.customer_id and t.card_number <> ds2.card_number ")
rule.setRuleType("SQLF")
LEFT JOIN
Example
opt.dataset = "example_ds"
val rule = OwlUtils.createRule(opt.dataset)
rule.setRuleNm("not_back2back_days")
rule.setRuleValue(" select * from @example_ds A LEFT OUTER JOIN @t1 B ON A.customer_id = B.customer_id where A.customer_id is not null and B.customer_id is null ")
rule.setRuleType("SQLF")
rule.setPerc(1.0)
rule.setPoints(1)
rule.setIsActive(1)
rule.setUserNm("admin")
Known Limitations
Freeform rules with fully qualified column names are currently unsupported when they use the following syntax:
select <column name> FROM @<dataset name> WHERE @<dataset name>.<column name> condition
A possible workaround to this limitation is to use aliasing instead.